(1) Real Party In Interest

The real parties in interest in the subject
application are Sony Electronics Inc. and Sony Corporation.

(2) Related Appeals and Interferences 

No related appeals or interferences are known to
Appellant.



(3) Status of Claims 

Claims 1-25 were submitted for examination in the 
application filed on October 19, 2000. 

Claims 26-56 were added by amendment. 

Claims 1, 5-10, 15-17, and 26 were amended. 

Claims 18-25, 31 and 45-56 were canceled. 

Claims 1-17, 26-30 and 32-44 are pending. 

Claims 1-17, 26-30 and 32-44 are appealed. 

Claim 17 stands rejected under 35 U.S.C. 102(e)
as being anticipated by U.S. Patent No. 6,408,272 B1 (White
et al.).

Claims 1-16, 26-30 and 32-44 stand rejected under
35 U.S.C. 103(a) as being unpatentable over U.S. Patent No.
6,324,512 B1 (Junqua et al.) in view of Hands free
Continuous Speech Recognition in Noisy Environment Using a
Four Microphone Array (Giuliani et al.) further in view of
U.S. Patent No. 6,408,272 B1 (White et al.).

(4) Status of Amendments

No amendments have been filed subsequent to the 
final rejection. 

(5) Summary of Claimed Subject Matter 

Independent Claims 1, 6, 9, 10, and 26 are 
directed to a natural language interface control system 102 
for controlling a plurality of devices 114. The system 
includes a microphone or microphone array 108 that is
coupled to a feature extraction module 202. The feature
extraction module 202 is coupled to the speech recognition
module 206. The speech recognition module 206 is coupled 
to a natural language interface module 222. The natural 
language interface module 222 is coupled to a device 
interface 210 and is utilized to operate a plurality of 
devices 114 coupled to the device interface 210. Claims 1, 
6, and 26 state that the natural language interface 
abstracts each of the devices into a respective one of a 
plurality of different grammars and a respective one of a 
plurality of lexica corresponding to each of the plurality 
of devices. Claim 9 includes a grammar module 218 for 
storing different grammars for each of the plurality of 
devices. Claim 10 includes an acoustic model module 220 
for storing different acoustic models for each of the 
plurality of devices. See Fig. 2 for one embodiment of the
invention.

Independent claims 7 and 8 are directed to a 
natural language interface control system 102 for 
controlling a plurality of devices 114. The system 
includes a microphone or microphone array 108 that is
coupled to a feature extraction module 202. The feature
extraction module 202 is coupled to the speech recognition 
module 206. The speech recognition module 206 is coupled 
to a natural language interface module 222. The natural 
language interface module 222 is coupled to a device 
interface 210 and is utilized to operate a plurality of 
devices 114 coupled to the device interface 210. Claim 7 
requires that the natural language interface module search 
for non-prompted open ended user requests upon the receipt 
and recognition of an attention word. Claim 8 requires 
that the natural language interface module context switch 
grammars, acoustic models, and lexica upon receipt and 
recognition of an attention word. See Fig. 2 for one 
embodiment of the invention. 

Independent claim 17 is a method of speech 
recognition. The method includes searching for an 
attention word based upon a first set of models, grammars 
and lexica and switching, upon finding the attention word,
to a second context including a second set of models,
grammars and lexica, to search for an open-ended user
request.

(6) Grounds of Rejection to be Reviewed 

The following issues are presented for review: 

Issue 1: whether independent claim 17 is
anticipated by U.S. Patent No. 6,408,272 (White et al.);

Issue 2: whether claims 1-6, 11-16, 26-30 and 32-44
are patentable over U.S. Patent No. 6,324,512 B1 (Junqua
et al.) in view of Hands free Continuous Speech Recognition
in Noisy Environment Using a Four Microphone Array
(Giuliani et al.) further in view of U.S. Patent No.
6,408,272 B1 (White et al.); and

Issue 3: whether claim 7 is patentable over U.S.
Patent No. 6,324,512 B1 (Junqua et al.) in view of Hands
free Continuous Speech Recognition in Noisy Environment
Using a Four Microphone Array (Giuliani et al.) further in
view of U.S. Patent No. 6,408,272 B1 (White et al.).

Issue 4: whether claim 8 is patentable over U.S.
Patent No. 6,324,512 B1 (Junqua et al.) in view of Hands
free Continuous Speech Recognition in Noisy Environment
Using a Four Microphone Array (Giuliani et al.) further in
view of U.S. Patent No. 6,408,272 B1 (White et al.).

Issue 5: whether claims 9 and 10 are patentable
over U.S. Patent No. 6,324,512 B1 (Junqua et al.) in view
of Hands free Continuous Speech Recognition in Noisy
Environment Using a Four Microphone Array (Giuliani et al.)
further in view of U.S. Patent No. 6,408,272 B1 (White et
al.).

(7) Argument 

Appellant submits that the claims of Groups I, II,
III, IV, and V stand or fall separately from the
claims of each of the other groups. Appellant argues
below under separate issues why the claims of each
group are believed to be separately patentable. As
referred to herein, the claims are divided into the
following groups:

Group I: claims 1-6, 11-16, 26-30 and 32-44;

Group II: claim 7; 

Group III: claim 8; 

Group IV: claims 9 and 10; and 

Group V: claim 17. 

Issue 1: independent claim 17 (Group V) is not
anticipated by White et al.

The final rejection errs in stating that claim 17
(Group V) is anticipated by White et al.

M.P.E.P. § 2131 states that "[a] claim is
anticipated only if each and every element as set forth in 
the claim is found, either expressly or inherently 
described, in a single prior art reference." 

White et al. describe a distributed voice user 
interface system. The system includes a local device that 
receives speech input (e.g., a command) issued from a user. 
The local device performs preliminary processing of the 
speech input and determines whether it can respond to the 
command by itself. If not, the local device initiates 
communication with a remote system for further processing 
of the speech input. 

The final rejection cites Column 6, lines 31-55 
of White et al. as teaching the elements of claim 17. This 
section of White et al. teaches a local device that has
limited voice recognition capabilities but is capable
of "word spotting" by scanning speech for the occurrence of
one or more "keywords". Because the local device has a
limited vocabulary (e.g., less than 100 words) it is only 
capable of responding to relatively simple commands, 
instructions, directions or requests from a user. When the 
local device does not recognize any of the keywords it 
sends the speech over a network to a remote device that has 
more extensive speech recognition capabilities. 

In contrast, Applicants' independent claim 17
recites "searching for an attention word based on a first
context including a first set of
models, grammars, and lexica" and "switching, upon finding
the attention word, to a second context to search for an
open-ended user request, wherein the second context
includes a second set of models, grammars, and lexicons."
As stated in Applicants' specification at page 10, lines 1-30,
the attention word notifies the Natural Language
Interface Controller System (NLICS) that following the
attention word, a request will arrive. As such, the
microphone arrays employed by the NLICS only have to search
for the attention word or words within the physical space
defined by the microphone arrays. For example, if the
attention word is programmed as "Mona", then the user's
request becomes "Mona, I wanna watch TV." Furthermore,
individual users may have separate attention words specific
to that user. For example, within a household, a first
user's attention word is "Mona" while a second user's
attention word is "Thor". When the NLICS hears the
attention word "Mona", the system assumes that the first
user is issuing the command, and so the NLICS will load the
grammars and acoustic models corresponding to that user
(context switching).

It appears the final rejection has equated the
"keywords" of White et al. to "searching for an attention
word based on a first context including a first set of
models, grammars, and lexica," as claimed by Applicant.
However, when a "keyword" is found in White et al., the
local device will perform the command that was received
from the user. In contrast, the "attention word"
functions, for example, to identify the user, to avoid
false detections of requests and to distinguish between
regular conversation and background noise. Thus, the
"keywords" of White et al. are not the same as an
"attention word" as recited in claim 17. The "keywords" of
White et al. represent the entire vocabulary of commands
for a local device, whereas the "attention word" of Applicants'
claim is used to notify the Natural Language Interface
Controller System (NLICS) that, for example, following the
attention word, a request will arrive. This provides for
an efficient voice recognition system. Therefore, White et
al. does not teach or suggest "searching for an attention
word based on a first context including a first set of
models, grammars, and lexica."

On page 3 of the final rejection, the Examiner
states that White et al. at least recognizes a "wake up"
command and points to Column 13, line 60 through Column 14,
line 10 of White et al. to support the position that the
system searches for a keyword and utilizes remote resources
to perform the needs that are associated with the keyword
upon finding the keyword/attention word.

First, as explained above, while White et al. may
search for "keywords", the system of White et al. only
utilizes the remote resource when a word or phrase cannot
be identified by the local resources. Thus, the system of
White et al. does not perform the step of "switching, upon
finding the attention word, to a second context to search
for an open-ended user request, wherein second context
includes a second set of models, grammar and lexicons," as
is claimed by Applicants. The only time the system of
White et al. switches to the remote resources is when a
phrase is not identified.

Additionally, the "wake up" command of White et
al. is only one of many types of triggers that can activate
the system (e.g., a manual input device, time lapse or
voice activation). However, White et al. does not state
that there is a change in context that includes a second
set of models, grammars and lexicons that occurs upon
receipt of the "wake up" command. There is no indication
that the set of models, grammars and lexicons is any
different than under normal operation. That is, simply
having the "wake up" command does not teach or suggest the
step of "switching, upon finding the attention word, to a
second context to search for an open-ended user request,
wherein second context includes a second set of models,
grammar and lexicons."

Additionally, the remote voice recognition system
of White et al. is only utilized when the local device does
not recognize the command issued by a user. It appears the
final rejection has equated sending the recorded voice
command to the remote device of White et al. for further
processing to Applicants' claimed "switching, upon finding
the attention word, to a second context." As described
above, the system of White et al. only uses the remote
device (i.e., switches to a different context) when the
command is not understood, not when the system does find a
"keyword" and is capable of responding to the request or
command without utilizing the remote system.

Thus, White et al. do not disclose "switching,
upon finding the attention word, to a second context to
search for an open-ended user request, wherein second
context includes a second set of models, grammar and
lexicons," as is claimed by Applicants. As described
above, this provides for an efficient method of voice
recognition including features not taught or suggested by
White et al. That is, White et al. only switches to
utilizing the remote system when a command is not
understood, not "upon finding the attention word," as
claimed by Applicants. Therefore, White et al. do not
anticipate Applicants' claim 17 because not each and every
element as set forth in the claim is found, either
expressly or inherently described, in the teachings of
White et al.

Thus, for all of the reasons stated above, 
Applicants respectfully submit the rejection errs in 
finding claim 17 anticipated by White et al. 

Issue 2: independent claims 1-6, 11-16, 26-30 and
32-44 (Group I) are patentable over Junqua et al. in view
of Giuliani et al. and further in view of White et al.

The final rejection errs in stating that the
claims of Group I are obvious in view of the combination of
the cited prior art.

M.P.E.P. § 2143 sets forth the requirements for a
prima facie case of obviousness:

"To establish a prima facie case of obviousness,
three basic criteria must be met. First, there
must be some suggestion or motivation, either in
the references themselves or the knowledge
generally available to one of ordinary skill in
the art, to modify the reference or to combine
reference teachings. Second, there must be a
reasonable expectation of success. Finally, the
prior art reference (or references when combined)
must teach or suggest all of the claimed
limitations." (emphasis added)

Junqua et al. disclose a voice recognition system
where users can control a television and/or recorder. The
system is used to hold a natural language dialog with
users.

Giuliani et al. describe enhancement techniques
for speaker-independent continuous speech recognition.
Such techniques are used for recognition improvement of
cleanly input speech or for speech generated in noisy
conditions. These techniques involve acquiring a signal
through an array of microphones, compensating for a
corresponding time delay, enhancing the acquired signal by
a spectrum weighting process, parsing the enhanced signal
by means of a digital filter, and matching segments of the
parsed signal to various hidden Markov models.

Claim 1 recites in part "wherein the natural 
language interface module abstracts each of the plurality 
of devices into a respective one of the different grammars 
and a respective one of a plurality of lexica corresponding 
to each of the plurality of devices." The rejection states 
that column 2, line 2 through Column 3, line 35 of Junqua 
et al. shows that Junqua et al. teaches the above recited 
element of claim 1. 

The section cited by the Examiner states that the 
system of Junqua et al. includes a parser that supplies its 
output to a unified access controller module used to send 
commands to a digital tuner 40 or recorder 44. The parser 
is a goal-oriented parser that has a pre-defined database 
of grammars stored within it. If the unified access 
controller does not understand a command, using its dialog 
manager, the unified access controller prompts the user for 
additional information. If the response from a user is 
sufficiently refined to constitute a command, the unified 
access controller sends a command to the television. This
section of Junqua et al. teaches having a pre-defined
database of grammars and a system that can prompt a user 
for additional information if a command is not understood. 
However, this section of Junqua et al. does not teach or 
suggest "wherein the natural language interface abstracts 
each of the plurality of devices into a respective one of a 
plurality of grammars and a respective one of a plurality 
of lexica corresponding to each of the plurality of 
devices," such as is claimed by Applicant. 

The rejection states on page 4 that "the grammar 
is necessarily specific to each unit wherein the recorder 
is associated with specific recording grammar, and the 
tuner is associated with channel selection grammar." This, 
however, is a conclusion by the Examiner that is not 
supported by the specification of Junqua et al. There is 
no reason why the unified access controller module must
abstract each of the plurality of devices into a
respective one of a plurality of grammars and a respective 
one of a plurality of lexica corresponding to each of the 
plurality of devices. Furthermore, while the Examiner 
makes this assumption, Junqua et al. does not teach or 
suggest that this is how their unified access controller 
module functions. 

As described in Applicant's specification at the
paragraph beginning on page 10, line 30, "[o]ne feature that
enables the NLICS 102 to function efficiently is that each
of the devices 114 coupled to the NLICS 102 are abstracted
into a separate device abstraction such that separate
grammars and lexicons are stored for each of the devices 
114. For example, as the natural language interface module 
determines that the request is for the DVD player, a 
grammar and lexicon specific to that particular context...is 
used to aid in the processing of arriving acoustic data 
within the speech recognition module." Junqua et al. does 
not teach or suggest a system with increased efficiency 
having a natural language interface module that "abstracts 
each of the plurality of devices into a respective one of a 
plurality of grammars and a respective one of a plurality 
of lexica corresponding to each of the plurality of 
devices," such as is claimed by Applicant. 

Further, Applicant submits that neither Giuliani
et al. nor White et al. teaches or suggests a system "wherein
the natural language interface abstracts each of the
plurality of devices into a respective one of a plurality
of grammars and a respective one of a plurality of lexica
corresponding to each of the plurality of devices," such as
is claimed by Applicant. Thus, the final rejection does
not establish a prima facie case of obviousness for the
claims of Group I.

Therefore, Junqua et al., Giuliani et al., and 
White et al. do not, individually or in combination, teach 
or suggest all of the claim limitations of independent 
claims 1, 6 and 26. Thus, Applicant respectfully submits 
the rejection errs in the rejection of the claims of Group 
I as the rejection fails to present a prima facie case of 
obviousness.

Issue 3: independent claim 7 (Group II) is
patentable over Junqua et al. in view of Giuliani et al.
and further in view of White et al.

The final rejection errs in stating that the
claims of Group II are obvious in view of the combination of
the cited prior art.

Independent claim 7 recites "wherein the natural
language interface module searches for the non-prompted,
open-ended, natural language requests upon the receipt and
recognition of an attention word." The final rejection has
cited various sections of White et al. and it appears the
final rejection is equating the "word spotting" and
"keywords" of White et al. to Applicant's claimed
"attention word." As described above with reference to Issue
1, White et al. does not teach or suggest an "attention
word" as claimed by Applicant.

Additionally, White et al. does not teach or
suggest a system including a natural language interface
that "searches for the non-prompted, open-ended, natural
language requests upon the receipt and recognition of an
attention word." The system of White et al. is designed to
recognize speech only at an elementary level, for example,
by keyword searching. For this purpose, the speech
recognition engine may comprise a keyword search component
which is able to identify and recognize a limited number of
keywords (see White et al., column 12, lines 14-17). If a
keyword is recognized, a command is executed; however, the
recognition of the keyword does not prompt the system of
White et al. to search "for the non-prompted, open-ended,
natural language requests upon the receipt and recognition
of an attention word," such as is claimed by Applicant.

Thus, White et al. does not teach or suggest a 
system "wherein the natural language interface module 
searches for the non-prompted, open-ended, natural language 
requests upon the receipt and recognition of an attention 
word." Applicant further submits that neither Junqua et 
al. nor Giuliani et al. teach this claimed element. 

Therefore, Junqua et al., Giuliani et al., and
White et al. do not, individually or in combination, teach
or suggest all of the elements of claim 7. Thus, Applicant
respectfully submits the rejection errs in the rejection of
the claim of Group II as the rejection fails to present a
prima facie case of obviousness.

Issue 4: independent claim 8 (Group III) is
patentable over Junqua et al. in view of Giuliani et al.
and further in view of White et al.

The final rejection errs in stating that the
claims of Group III are obvious in view of the combination of
the cited prior art.

Independent claim 8 recites "wherein the natural
language interface module context switches grammars,
acoustic models, and lexica upon receipt and recognition of
an attention word." The Examiner has cited various
sections of White et al. and it appears the Examiner is
equating the "word spotting" and "keywords" of White et al.
to Applicant's claimed "attention word." As described
above with reference to Issue 1, White et al. does not teach
or suggest an "attention word" as claimed by Applicant.
More specifically, White et al. does not switch grammars,
acoustic models, and lexica upon receipt of a keyword, but
performs a command upon recognition of a keyword.

Thus, White et al. does not teach or suggest a 
system "wherein the natural language interface module 
context switches grammars, acoustic models, and lexica upon 
receipt and recognition of an attention word," as is
claimed by Applicant. Applicant further submits that 
neither Junqua et al. nor Giuliani et al. teach the claimed 
element. 

Therefore, Junqua et al., Giuliani et al., and
White et al. do not, individually or in combination, teach
or suggest all of the elements of claim 8. Thus, Applicant
respectfully submits the rejection errs in the rejection of 
the claim of Group III as the rejection fails to present a 
prima facie case of obviousness. 

Issue 5: independent claims 9 and 10 (Group IV)
are patentable over Junqua et al. in view of Giuliani et
al. and further in view of White et al.

The final rejection errs in stating that the
claims of Group IV are obvious in view of the combination of
the cited prior art.

Independent claim 9 recites "a grammar module for
storing different grammars for each of the plurality of
devices." Independent claim 10 recites "an acoustic model
module for storing different acoustic models for each of
the plurality of devices." As described above with
reference to the rejection of the claims of Group I, Junqua
et al. does not teach or suggest having different grammars
or models for separate devices under control of the natural 
language interface control system. Having different models 
or grammars for each of the plurality of devices provides 
for an efficient speech recognition system that is able to 
context switch between grammars and models that are for 
each of the plurality of devices controlled by the system. 

Similarly to above, the rejection appears to rely 
on the fact that the grammars or models are necessarily 
specific to each unit. This, however, is a conclusion by 
the Examiner that is not supported by the specification of 
Junqua et al. There is no reason why the system of Junqua 
et al. must have different grammars and models for each of 
the plurality of devices. Furthermore, while the Examiner 
makes this assumption, Junqua et al. does not teach or 
suggest that this is how their unified access controller 
module functions. 

Therefore, Junqua et al., Giuliani et al., and 
White et al. do not, individually or in combination, teach 
or suggest all of the elements of claim 9 or 10. Thus, 
Applicant respectfully submits the rejection errs in the 
rejection of the claims of Group IV as the rejection fails
to present a prima facie case of obviousness.

CONCLUSION

Appellant submits that the rejection errs in the
rejection of the claims of Groups I, II, III, IV, and V;
that the claims of Groups I, II, III, and IV are not 
rendered obvious by the combination of the cited references 
and the claim of Group V is not anticipated by the cited 
reference. 

Appellant respectfully requests a reversal of the 
final rejection. 



Dated: January 17, 2006 



Address all correspondence to: 

FITCH, EVEN, TABIN & FLANNERY 
120 South LaSalle Street, Ste. 1600 
Chicago, IL 60603 
(858) 552-1311 



Respectfully submitted, 




Martin R. Bader 
Reg. No. 54,736 

(8) Appendix

Provided is a complete listing of all the pending 
claims involved with this appeal: 

1. A natural language interface control system
for operating a plurality of devices comprising: 

a 3 dimensional microphone array; 

a feature extraction module coupled to the first 
microphone array; 

a speech recognition module coupled to the 
feature extraction module, wherein the speech recognition 
module utilizes hidden Markov models and can switch between 
different acoustic models and different grammars, wherein 
at least one of the different acoustic models and at least 
one of the different grammars is downloaded over a network; 

a natural language interface module coupled to 
the speech recognition module; and 

a device interface coupled to the natural 
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted, 
open-ended natural language requests from a user;

wherein the natural language interface module
abstracts each of the plurality of devices into a
respective one of the different grammars and a respective
one of a plurality of lexica corresponding to each of the
plurality of devices.

2. The system of Claim 1 further
comprising the plurality of devices coupled to the natural
language interface module.

3. The system of Claim 1 wherein the speech 
recognition module utilizes an N gram grammar. 

4. The system of Claim 1 wherein the natural 
language interface module utilizes a probabilistic context 
free grammar. 

5. The system of Claim 1
wherein the microphone array comprises said 3 dimensional 
microphone array further comprising a planar microphone 
array and at least one linear microphone array located in a 
different plane in space. 

6. A natural language interface control system 
for operating a plurality of devices comprising: 
a 3 dimensional microphone array;

a feature extraction module coupled to the first
microphone array;

a speech recognition module coupled to the 
feature extraction module, wherein the speech recognition 
module utilizes hidden Markov models and can switch between 
different acoustic models and different grammars;

a natural language interface module coupled to
the speech recognition module; and

a device interface coupled to the natural
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted, 
open-ended natural language requests from a user; 

wherein the natural language interface abstracts 
each of the plurality of devices into a respective one of a 
plurality of grammars and a respective one of a plurality 
of lexica corresponding to each of the plurality of 
devices.

7. A natural language interface control system
for operating a plurality of devices comprising:

a 3 dimensional microphone array;

a feature extraction module coupled to the first
microphone array;

a speech recognition module coupled to the
feature extraction module, wherein the speech recognition
module utilizes hidden Markov models and can switch between 
different acoustic models and different grammars; 

a natural language interface module coupled to 
the speech recognition module; and 

a device interface coupled to the natural 
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted, 
open-ended natural language requests from a user; 

wherein the natural language interface module 
searches for the non-prompted, open-ended user requests
upon the receipt and recognition of an attention word. 

8. A natural language interface control system 
for operating a plurality of devices comprising: 

a 3 dimensional microphone array; 

a feature extraction module coupled to the first 
microphone array; 

a speech recognition module coupled to the 
feature extraction module, wherein the speech recognition 
module utilizes hidden Markov models and can switch between 
different acoustic models and different grammars; 

a natural language interface module coupled to 
the speech recognition module; and 

a device interface coupled to the natural 
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted, 
open-ended natural language requests from a user; 

wherein the natural language interface module 
context switches grammars, acoustic models, and lexica upon 
receipt and recognition of an attention word. 

9. A natural language interface control system
for operating a plurality of devices comprising:

a 3 dimensional microphone array;

a feature extraction module coupled to the first
microphone array;

a speech recognition module coupled to the
feature extraction module, wherein the speech recognition 
module utilizes hidden Markov models and can switch between 
different acoustic models and different grammars; 

a natural language interface module coupled to 
the speech recognition module; 

a device interface coupled to the natural 
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted, 
open-ended natural language requests from a user; and 

a grammar module for storing different grammars 
for each of the plurality of devices. 

10. A natural language interface control system 
for operating a plurality of devices comprising: 

a 3 dimensional microphone array;

a feature extraction module coupled to the first
microphone array;

a speech recognition module coupled to the 
feature extraction module, wherein the speech recognition 
module utilizes hidden Markov models and can switch between 
different acoustic models and different grammars; 

a natural language interface module coupled to 
the speech recognition module; 

a device interface coupled to the natural 
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted,
open-ended natural language requests from a user; and 

an acoustic model module for storing different 
acoustic models for each of the plurality of devices. 

11. The system of Claim 1 wherein the device 
interface comprises a wireless device interface. 

12. The system of Claim 1 further comprising an
external network interface coupled to the natural language 
interface control system. 

13. The system of Claim 1 further comprising a 
remote unit containing a first microphone array, the 
feature extraction module, the speech recognition module, 
and the natural language interface module, wherein said 3 
dimensional microphone array includes the first microphone 
array.

14. The system of Claim 13 further comprising a 
base unit coupled to the remote unit. 

15. The system of Claim 14 wherein the base unit 
includes a second microphone array, wherein said 3 
dimensional microphone array includes the second microphone 
array.

16. The system of Claim 15 wherein the first
microphone array and the second microphone array implement 
said 3 dimensional microphone array. 

17. A method of speech recognition comprising:

searching for an attention word based on a first
context including a first set of models, grammars, and
lexica; and

switching, upon finding the attention word, to a 
second context to search for an open-ended user request, 
wherein the second context includes a second set of models, 
grammars, and lexicons. 

18-25. (Canceled) 

26. A natural language interface control system 
for operating a plurality of devices comprising: 
a first microphone; 

a feature extraction module coupled to the first
microphone;

a speech recognition module coupled to the
feature extraction module;

a natural language interface module coupled to 
the speech recognition module; 

a device interface coupled to the natural 
language interface module, wherein the natural language 
interface module is for operating a plurality of devices 
coupled to the device interface based upon non-prompted, 
open-ended natural language requests from a user; and 

an external network interface coupled to the
natural language interface control system; 

wherein the natural language interface abstracts 
each of the plurality of devices into a respective one of a 
plurality of grammars and a respective one of a plurality 
of lexica corresponding to each of the plurality of 
devices.

27. The system of Claim 26 further comprising the 
plurality of devices coupled to the natural language 
interface module.

28. The system of Claim 26 wherein the speech 
recognition module utilizes an N gram grammar. 

29. The system of Claim 26 wherein the natural 
language interface module utilizes a probabilistic context 
free grammar. 

30. The system of Claim 26 wherein the microphone 
array comprises a 3 dimensional microphone array comprising 
a planar microphone array and at least one linear 
microphone array located in a different plane in space. 

31. (Canceled)

32. The system of Claim 26 wherein the natural 
language interface module searches for the non-prompted,
open-ended user requests upon the receipt and recognition 
of an attention word. 

33. The system of Claim 26 wherein the natural 
language interface module context switches grammars, 
acoustic models, and lexica upon receipt and recognition of 
an attention word. 

34. The system of Claim 26 further comprising a 
grammar module for storing different grammars for each of 
the plurality of devices. 

35. The system of Claim 26 further comprising an 
acoustic model module for storing different acoustic models 
for each of the plurality of devices.

36. The system of Claim 26 wherein the device 
interface comprises a wireless device interface. 

37. The system of Claim 26 further comprising a 
remote unit containing the first microphone array, the 
feature extraction module, the speech recognition module, 
and the natural language interface module. 

38. The system of Claim 37 further comprising a 
base unit coupled to the remote unit. 

39. The system of Claim 38 wherein the base unit 
includes a second microphone array.

40. The system of Claim 39 wherein the first
microphone comprises a first microphone array, and said
first microphone array and the second microphone array
implement a 3 dimensional microphone array.

41. The system of Claim 26 further comprising a
central database coupled to said external network 
interface, said central database including at least one of 
grammars; speech models; device abstractions; programming 
information; and lexica. 

42. The system of Claim 41 wherein said central 
database is coupled to said external network interface 
through an external network. 

43. The system of Claim 42 further comprising: 

a remote server coupled to said external network
and to said central database.

44. The system of Claim 42 further comprising: 
another natural language interface control
system; and

another external network interface coupled to said other 
natural language interface control system, and to said 
external network. 